
    Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist

    Apache Spark is a popular system aimed at the analysis of large data sets, but recent studies have shown that certain computations---in particular, many linear algebra computations that are the basis for solving common machine learning problems---are significantly slower in Spark than when done using libraries written in a high-performance computing framework such as the Message-Passing Interface (MPI). To remedy this, we introduce Alchemist, a system designed to call MPI-based libraries from Apache Spark. Using Alchemist with Spark helps accelerate linear algebra, machine learning, and related computations, while still retaining the benefits of working within the Spark environment. We discuss the motivation behind the development of Alchemist, and we provide a brief overview of its design and implementation. We also compare the performance of pure Spark implementations with that of Spark implementations that leverage MPI-based codes via Alchemist. To do so, we use two data science case studies: a large-scale application of the conjugate gradient method to solve very large linear systems arising in a speech classification problem, where we see an improvement of an order of magnitude; and the truncated singular value decomposition (SVD) of a 400 GB three-dimensional ocean temperature data set, where we see a speedup of up to 7.9x. We also illustrate that the truncated SVD computation is easily scalable to terabyte-sized data by applying it to data sets of sizes up to 17.6 TB. Comment: Accepted for publication in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London, UK, 201
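The conjugate gradient method used in the first case study is a standard Krylov iteration for symmetric positive-definite systems. A minimal serial NumPy sketch (illustrative only; not the Alchemist/MPI implementation, and the variable names are our own):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)   # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:   # converged
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Small SPD system as a sanity check
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```

In the distributed setting the expensive step is the matrix-vector product `A @ p`, which is exactly the kind of operation that benefits from being offloaded to an MPI-based library.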

    Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

    We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity), and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling, and bioimaging. The data matrices are tall and skinny, which enables the algorithms to map conveniently onto Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
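One reason tall-and-skinny matrices map well onto a data-parallel model is that the small k-by-k Gram matrix can be accumulated by a row-wise map-and-reduce, after which the factorization is cheap. A serial NumPy sketch of this route for PCA (illustrative only; not the paper's code, and the sizes are arbitrary):

```python
import numpy as np

# Tall-and-skinny matrix: many rows (samples), few columns (features).
rng = np.random.default_rng(0)
A = rng.standard_normal((10000, 20))

# Center the columns, then form the small 20 x 20 Gram matrix.
# In a data-parallel setting each worker sums outer products of its rows.
Ac = A - A.mean(axis=0)
G = Ac.T @ Ac

# Eigendecomposition of the Gram matrix yields the principal directions
# (the right singular vectors of the centered matrix).
evals, evecs = np.linalg.eigh(G)
order = np.argsort(evals)[::-1]        # eigh returns ascending order
components = evecs[:, order]

# Project onto the top 5 principal components.
scores = Ac @ components[:, :5]
```

The per-row work is independent, so only the tiny Gram matrix needs to be reduced across the cluster; this is what makes the computation convenient in Spark's model.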

    Dynamic Analyses of Result Quality in Energy-Aware Approximate Programs

    Thesis (Ph.D.)--University of Washington, 2014.
    Energy efficiency is a key concern in the design of modern computer systems. One promising approach to energy-efficient computation, approximate computing, trades off output precision for energy efficiency. However, this trade-off can have unexpected effects on computation quality. This thesis presents dynamic analysis tools to study, debug, and monitor the quality and energy efficiency of approximate computations. We propose three styles of tools: prototyping tools that allow developers to experiment with approximation in their applications, offline tools that instrument code to determine the key sources of error, and online tools that monitor the quality of deployed applications in real time. Our prototyping tool is based on an extension to the functional language OCaml. We add approximation constructs to the language, an approximation simulator to the runtime, and profiling and auto-tuning tools for studying and experimenting with energy-quality trade-offs. We also present two offline debugging tools and three online monitoring tools. The first offline tool identifies correlations between output quality and the total number of executions of, and errors in, individual approximate operations. The second tracks the number of approximate operations that flow into a particular value. Our online tools comprise three low-cost approaches to dynamic quality monitoring. They are designed to monitor quality in deployed applications without spending more energy than is saved by approximation. Online monitors can be used to perform real-time adjustments to energy usage in order to meet specific quality goals. We present prototype implementations of all of these tools and describe their usage with several applications. Our prototyping, profiling, and auto-tuning tools allow us to experiment with approximation strategies and identify new ones; our offline tools succeed in providing new insights into the effects of approximation on output quality; and our monitors succeed in controlling output quality while still maintaining significant energy-efficiency gains.
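The core idea behind low-cost online quality monitoring (spot-checking a small sample of approximate results against their precise counterparts) can be illustrated with a toy sketch. This is our own simplified model, not the thesis's tools: approximation is simulated by bounded random perturbation, and the monitor re-executes a sampled fraction of operations exactly.

```python
import random

def exact_op(x):
    return x * x

def approx_op(x, rel_err=0.05):
    # Simulated approximate hardware: result perturbed by a bounded
    # relative error (here at most 5%).
    return exact_op(x) * (1 + random.uniform(-rel_err, rel_err))

def monitored_run(inputs, sample_rate=0.1, threshold=0.1):
    """Online monitor: precisely re-check a random sample of outputs."""
    random.seed(42)
    outputs, checked, violations = [], 0, 0
    for x in inputs:
        y = approx_op(x)
        outputs.append(y)
        if random.random() < sample_rate:   # cheap spot check
            checked += 1
            exact = exact_op(x)
            if abs(y - exact) / abs(exact) > threshold:
                violations += 1
    return outputs, checked, violations

outs, checked, violations = monitored_run(range(1, 1001))
```

Because only a fraction of operations are re-executed precisely, the monitoring overhead stays well below the energy saved by approximating the rest, which is the property the thesis's online monitors are designed around.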

    Applying the Vector Radix Method to Multidimensional, Multiprocessor, Out-of-Core Fast Fourier Transforms

    We describe an efficient algorithm for calculating Fast Fourier Transforms on matrices of arbitrarily high dimension using the vector-radix method when the problem size is out-of-core (i.e., when the size of the data set is larger than the total available memory of the system). The algorithm takes advantage of multiple processors when they are present, but it is also efficient on single-processor systems. Our work is an extension of work done by Lauren Baptist in [Bapt99], which applied the vector-radix method to 2-dimensional out-of-core matrices. To determine the effectiveness of the algorithm, we present empirical results as well as an analysis of the I/O, communication, and computational complexity. We perform the empirical tests on a DEC 2100 server and on a cluster of Pentium-based Linux workstations. We compare our results with the traditional dimensional method of calculating multidimensional FFTs, and show that as the number of dimensions increases, the vector-radix-based algorithm becomes increasingly effective relative to the dimensional method. In order to calculate the complexity of the algorithm, it was necessary to develop a method for analyzing the interprocessor communication costs of the BMMC data-permutation algorithm (presented in [CSW98]) used by our FFT algorithms. We present this analysis method and show how it was derived.
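The "dimensional method" used as the baseline computes a multidimensional FFT as a sequence of 1-D FFTs applied along each axis in turn. In-core, that equivalence is easy to check with NumPy (illustrative only; the paper's setting is out-of-core, where each axis pass forces a costly data permutation on disk):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8, 8))

# Dimensional method: apply 1-D FFTs along each axis, one axis at a time.
y = x.astype(complex)
for axis in range(x.ndim):
    y = np.fft.fft(y, axis=axis)

# Reference: the library's multidimensional FFT.
ref = np.fft.fftn(x)
```

Out-of-core, each axis pass streams the whole data set through memory, which is why reducing the number of passes (as the vector-radix method does) matters more as the dimension grows.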

    Preventing Format-String Attacks via Automatic and Efficient Dynamic Checking

    We propose preventing format-string attacks with a combination of static dataflow analysis and dynamic white-lists of safe address ranges. The dynamic nature of our white-lists provides the flexibility necessary to encode a very precise security policy, namely that %n-specifiers in printf-style functions should modify a memory location x only if the programmer explicitly passes a pointer to x. Our static dataflow analysis and source transformations let us automatically maintain and check the white-list without any programmer effort: programmers merely need to change the Makefile. Our analysis also detects pointers passed to vprintf-style functions through (possibly multiple layers of) wrapper functions. Our results establish that our approach provides better protection than previous work and incurs little performance overhead.

    Type Safety and Erasure Proofs for “A Type System for Coordinated Data Structures”

    We prove the Type Safety and Erasure Theorems presented in Section 4 of Ringenburg and Grossman’s paper “A Type System for Coordinated Data Structures” [1]. We also remind the reader of the syntax, semantics, and typing rules for the coordinated list language described in Section 3 of the same paper. We refer the reader to the original paper for a detailed presentation of the coordinated data structure type system. 1 The Language: Figures 1, 2, and 3 present, respectively, the syntax, semantics, and typing rules for our coordinated list language. We implicitly assume ∆ and Γ do not have repeated elements. For example, ∆, α:κ is ill-formed if α ∈ Dom(∆). To avoid conflicts, we can systematically rename constructs with binding occurrences. We therefore treat ∆ and Γ as partial functions. All explicit occurrences of α and x in the grammar are binding (except when they constitute the entire type or expression, of course). Substitution is defined as usual.